Disambiguating Entities Referred by Web Endpoints using Tree Ensembles

نویسندگان

  • Gitansh Khirbat
  • Jianzhong Qi
  • Rui Zhang
چکیده

This paper describes system details and results of team “EOF” from the University of Melbourne in the shared task of ALTA 2016, which addresses the use of cross document coreference resolution to determine whether two URLs refer to the same underlying entity. In our submission, we develop a two stage system which first identifies the underlying entity for a given URL using entity-level features by ranking the entity mentions present in the crawled text with the help of logistic regression. This is followed by disambiguating entities present in the given pair of URLs using a tree ensemble model to classify if both URLs refer to the same underlying entity. Our system achieved a final F1-score of 86.02% on the private leaderboard1, which is the best score among all the participating systems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluating the Effectiveness of Ensembles of Decision Trees in Disambiguating Senseval Lexical Samples

This paper presents an evaluation of an ensemble–based system that participated in the English and Spanish lexical sample tasks of SENSEVAL-2. The system combines decision trees of unigrams, bigrams, and co–occurrences into a single classifier. The analysis is extended to include the SENSEVAL-1 data.

متن کامل

Ontology-Driven Automatic Entity Disambiguation in Unstructured Text

Precisely identifying entities in web documents is essential for document indexing, web search and data integration. Entity disambiguation is the challenge of determining the correct entity out of various candidate entities. Our novel method utilizes background knowledge in the form of a populated ontology. Additionally, it does not rely on the existence of any structure in a document or the ap...

متن کامل

Disambiguating Descriptions: Mapping Digital Special Collections Metadata into Linked Open Data Formats

In this poster we describe the Linked Open Data (LOD) for Digital Special Collections project at the University of Illinois at Urbana-Champaign and describe some of the particular challenges that legacy metadata poses for representation in LOD formats. LOD formats are primarily based on the World Wide Web Consortium’s Resource Description Framework standard which demands both that entities be n...

متن کامل

Automatic QoS-aware Web Services Composition based on Set-Cover Problem

By definition, web-services composition works on developing merely optimum coordination among a number of available web-services to provide a new composed web-service intended to satisfy some users requirements for which a single web service is not (good) enough. In this article, the formulation of the automatic web-services composition is proposed as several set-cover problems and an approxima...

متن کامل

A Maximum Entropy Approach To Disambiguating VerbNet Classes

This paper focuses on verb sense disambiguation cast as inferring the VerbNet class to which a verb belongs. To train three different supervised learning models –Maximum Entropy (MaxEnt), Naive Bayes and Decision Tree– we used lexical, co-occurrence and typed-dependency features. For each model, we built three classifiers: one single classifier for all verbs, one single classifier for polysemou...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016